We introduce and analyze a model of a multi-directed Eulerian network, that is, a directed and
weighted network in which a path exists that passes through all the edges of the network once and only
once. Networks of this type can be used to describe information networks such as human language
or DNA chains. We are able to calculate the strength and degree distributions in this network and
find that they both exhibit a power law with an exponent between 2 and 3. We then analyze the
behavior of the accelerated version of the model and find that the strength distribution has a double
slope power law behavior. Finally we introduce a non-Eulerian version of the model and find that
the statistical topological properties remain unchanged. Our analytical results are compared with
numerical simulations.

PACS numbers: 89.75.-k, 89.20.Hh, 05.65.+b 
Many naturally occurring systems appear as chains of repeated elements. Such systems, for example human language
and DNA chains, often encode and transport information. Markov processes have been adopted to model those
chains [12]. Unfortunately Markov chains are not able to describe the long-range correlations that exist within these
structures. Thus complex growing networks appear to be a more suitable modeling tool.

In this paper we study written human language as a complex growing network. Since the discovery by Zipf [?]
that language exhibits a complex behavior, and the application of Simon's theories to growing networks [1], this
topic has been examined by a number of scientists [?, ?, 15].

A useful way to build a network from a text is to associate a vertex to each sign of the text, that is both words
and punctuation, and to put a link between two vertices if they are adjacent in the text. In a previous paper [?]
we showed that it is necessary to consider a directed and weighted network to understand the topological properties
of this language network, in which the weight of each link represents the number of directed links
connecting two vertices. Directed links are necessary in such a network since they describe systems in which
a syntax is defined and where the attachment rules between the objects are not reflexive [8].

When networks are built in this way, from a chain of repeated elements, the weighted adjacency matrix obtained
describes a well-known object in graph theory: the multi-Eulerian graph [?]. Eulerian means that there exists a path in
the graph passing through all the links of the network once and only once, while the prefix "multi" refers to the fact
that the adjacency matrix allows multiple links between two vertices.

In order to describe the evolution of a multi-directed graph we need to introduce the formalism of weighted
networks [?]. These are characterized by a weighted adjacency matrix $W = \{w_{ij}\}$ whose elements $w_{ij}$ represent the
number of directed links connecting vertex $i$ to vertex $j$. We define the degree $k_i^{out/in}$ of vertex $i$ as the number of
out/in nearest neighbours of vertex $i$, so that $k_i^{out/in} = \sum_j \Theta(w_{ij/ji} - \frac{1}{2})$, where $\Theta$ is the Heaviside step function. We define the out/in-strength $s_i^{out/in}$
of vertex $i$ as the number of outgoing/incoming links of vertex $i$, that is $s_i^{out/in} = \sum_j w_{ij/ji}$. Analytically the Eulerian
condition means that the graph must be connected and must have $s_i^{in} = s_i^{out}$ for every $i$.
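As an illustration of these definitions, the following minimal Python sketch builds the weighted adjacency matrix from a sequence of signs and computes strengths and degrees. The function names, the toy sentence, and the choice of closing the chain by linking the last sign back to the first (so that every vertex is balanced) are assumptions made for this example and are not part of the original analysis. Connectivity is not checked here.

```python
from collections import Counter

def build_text_network(tokens, close_chain=True):
    """Return weights[(i, j)] = number of times sign j directly follows sign i."""
    pairs = list(zip(tokens, tokens[1:]))
    if close_chain and len(tokens) > 1:
        # Assumption of this sketch: close the chain into a circuit.
        pairs.append((tokens[-1], tokens[0]))
    return Counter(pairs)

def strengths_and_degrees(weights):
    """Compute out/in-strengths (link counts) and out/in-degrees (distinct neighbours)."""
    s_out, s_in, k_out, k_in = Counter(), Counter(), Counter(), Counter()
    for (i, j), w in weights.items():
        s_out[i] += w
        s_in[j] += w
        k_out[i] += 1  # each occupied cell (i, j) is one distinct out-neighbour of i
        k_in[j] += 1   # and one distinct in-neighbour of j
    return s_out, s_in, k_out, k_in

# Toy text: words and punctuation marks are both treated as signs.
tokens = "the cat saw the dog , the dog saw the cat .".split()
weights = build_text_network(tokens)
s_out, s_in, k_out, k_in = strengths_and_degrees(weights)

# Balance part of the Eulerian condition: in-strength equals out-strength.
balanced = all(s_in[v] == s_out[v] for v in set(tokens))
```

In this toy example the sign "the" has strength 4 but only two distinct out-neighbours, which shows why strength and degree have to be tracked separately in a multi-directed network.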

In this work we first develop and analyze a model for a general multi-directed Eulerian growing network. Then,
since human language is an accelerated growing network, we extend our model to its accelerated version, and find
results similar to those in [?]. More recent works on accelerated growing networks can be found in [13, ?]. To conclude
we introduce and analyze the non-Eulerian version of our model. This last step allows us to build a directed network
without initial vertex attractiveness. As far as we are aware, this is the first time a model for directed networks has
been proposed without the help of this ingredient. The resulting power-law exponents, tunable between 2 and 3, are
of particular interest since they match those found in most real networks [15].



II. MODEL A 



First we introduce a model for the multi-directed Eulerian growing network, which we will call ModelA. The
Eulerian condition (hereafter EC) states that every newly introduced edge has to join the last connected vertex, so
that every newly introduced in-link implies a constrained out-link from the last connected vertex. This is equivalent
to saying that $s_i^{in} = \sum_j w_{ji} = \sum_j w_{ij} = s_i^{out} = s_i$ for every $i$, with the global constraint that the network must be connected
(Fig. 1). With the last condition our calculations become easier since we have to consider one quantity, that is $s$,
instead of two.

FIG. 1: Growth mechanism for model A with m = 1. Dashed grey arrows represent m + 2 newly introduced edges.

We start with a chain of 2m connected vertices. At each time step we create a new vertex and m + 2 new directed
edges (Fig. 1). At each time step

a- The new vertex will acquire one in-link with the constraint that the network must respect the EC.

b- The remaining m + 1 in-links will be attached to old vertices with probability proportional to their in-strength 
with the constraint that the network must respect the EC. 
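The growth rules can be read as the extension of a single continuous walk: each new edge starts from the vertex where the previous edge ended. The following Python sketch simulates the model under that reading. It is only one possible interpretation: the function name, the closed seed ring with at least two vertices, the exclusion of self-loops, and the freezing of strengths at the start of each time step are all assumptions of this example.

```python
import random
from collections import defaultdict

def simulate_model_a(m, steps, seed=0):
    """Grow a multi-directed network by extending one continuous walk (sketch)."""
    rng = random.Random(seed)
    n0 = max(2 * m, 2)            # assumption: at least two seed vertices
    weights = defaultdict(int)    # weights[(i, j)] = number of links i -> j
    s_in = [0] * n0
    s_out = [0] * n0
    in_ends = []                  # one entry per in-link: uniform sampling from
                                  # this list is proportional to in-strength

    def add_edge(i, j):
        weights[(i, j)] += 1
        s_out[i] += 1
        s_in[j] += 1
        in_ends.append(j)

    # Assumption: the seed chain is closed into a ring, so it starts balanced.
    for i in range(n0):
        add_edge(i, (i + 1) % n0)

    head = 0                      # last connected vertex (current end of the walk)
    for _ in range(steps):
        pool = len(in_ends)       # freeze strengths at the start of the step
        new = len(s_in)
        s_in.append(0)
        s_out.append(0)
        add_edge(head, new)       # rule a: the new vertex acquires one in-link
        head = new
        for _ in range(m + 1):    # rule b: m + 1 in-links go to old vertices
            target = in_ends[rng.randrange(pool)]
            while target == head:  # assumption: self-loops are not allowed
                target = in_ends[rng.randrange(pool)]
            add_edge(head, target)
            head = target
    return weights, s_in, s_out, head
```

Because the walk is never interrupted, every vertex except the two ends of the walk (vertex 0 and the returned `head`) has equal in- and out-strength, and each time step adds exactly m + 2 edges.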

To calculate the strength distribution for the model, we use the fact that with the EC the in-strength distribution will be exactly
the same as the out-strength distribution. We write the equation for the strength evolution $s(t,t_i)$ at time $t$ for the
vertex born at time $t_i$ as:

$$\frac{ds(t,t_i)}{dt} = (m+1)\,\frac{s(t,t_i)}{\sum_j s(t,t_j)}. \tag{1}$$

The right hand side of the last equation takes into account that m + 1 vertices acquire a link with probability
proportional to their normalized strength $s(t,t_i)/\sum_j s(t,t_j)$. Considering that the total number of in/out-links at time $t$ is
$\sum_j s(t,t_j) = (m+2)t$ and integrating Eq. (1) with the initial condition $s(t_i,t_i) = 1$ we obtain

$$s(t,t_i) = \left(\frac{t}{t_i}\right)^{\frac{m+1}{m+2}}. \tag{2}$$

Using the fact that

$$P(s,t) = -\frac{1}{t}\,\frac{\partial t_i(s,t)}{\partial s}, \tag{3}$$

from Eq. (2) we obtain:

$$P(s,t) = \frac{m+2}{m+1}\, s^{-\frac{2m+3}{m+1}}, \tag{4}$$

which is a stationary power-law distribution with exponent between 2 and 3. In particular it will be 3 for m = 0, and
it will tend to 2 for increasing values of m.
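The step from the strength evolution to the strength distribution can be written out explicitly. The following is a sketch that uses only the power-law growth of $s(t,t_i)$ with exponent $(m+1)/(m+2)$ and the relation $P(s,t) = -\frac{1}{t}\,\partial t_i/\partial s$; the shorthand $\beta$ is introduced here for convenience and is not part of the original notation.

```latex
% beta is shorthand introduced for this sketch only
\begin{aligned}
\beta &\equiv \frac{m+1}{m+2}, \qquad
s = \left(\frac{t}{t_i}\right)^{\beta}
\;\Longrightarrow\;
t_i(s,t) = t\, s^{-1/\beta}, \\
\frac{\partial t_i}{\partial s} &= -\frac{t}{\beta}\, s^{-\frac{1}{\beta}-1}
\;\Longrightarrow\;
P(s,t) = -\frac{1}{t}\,\frac{\partial t_i}{\partial s}
       = \frac{1}{\beta}\, s^{-\left(1+\frac{1}{\beta}\right)}
       = \frac{m+2}{m+1}\, s^{-\frac{2m+3}{m+1}}, \\
\int_1^{\infty} P(s,t)\, ds &= \frac{1}{\beta}\cdot\beta = 1
\qquad (t \to \infty).
\end{aligned}
```

The exponent $1 + 1/\beta = (2m+3)/(m+1)$ equals 3 at $m = 0$ and decreases towards 2 as $m$ grows, and the distribution is normalized in the large-$t$ limit.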

In order to calculate the degree distribution we consider that each time the strength of a vertex increases by 1, 
the degree of the vertex increases if and only if the vertex links with a new neighbor. This process implies higher 
order correlations. We will approximate this process as an uncorrelated one and compare our results with simulations.
Hence the equation governing the evolution of the degree is 



$$\frac{dk(t,t_i)}{dt} = \left[1 + m\left(1 - \frac{k(t,t_i)}{t}\right)\right]\frac{s(t,t_i)}{\sum_j s(t,t_j)}. \tag{5}$$



To understand this equation we have to notice that the degree of a vertex grows at a rate proportional to its
normalized strength, as in Eq. (1), but, when the strength of a vertex increases by 1, the probability that the degree of
vertex $i$ increases by 1 is $(1 - k(t,t_i)/t)$. In fact $k(t,t_i)$ is the number of nearest neighbors of vertex $i$, while $t$
represents the total number of vertices at time $t$. Note that for m = 0, $k(t,t_i) = s(t,t_i)$ as we would expect.
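The degree equation can also be integrated numerically, which gives a quick check of the m = 0 statement. The sketch below assumes that Eq. (5) has the form $dk/dt = [1 + m(1 - k/t)]\, s/((m+2)t)$ with $s(t,t_i) = (t/t_i)^{(m+1)/(m+2)}$; this form is a reconstruction that reproduces $k = s$ at m = 0, and the function names and the choice of a fourth-order Runge-Kutta integrator are assumptions of this example.

```python
def strength_a(t, t_i, m):
    """Strength at time t of the vertex born at t_i (power-law growth of model A)."""
    return (t / t_i) ** ((m + 1) / (m + 2))

def degree_rate(t, k, t_i, m):
    """Right-hand side of the degree equation in the form assumed in this sketch."""
    total_strength = (m + 2) * t
    return (1 + m * (1 - k / t)) * strength_a(t, t_i, m) / total_strength

def integrate_degree(t_i, t_end, m, n_steps=20000):
    """Integrate dk/dt from t_i to t_end with classical RK4, starting from k = 1."""
    h = (t_end - t_i) / n_steps
    t, k = float(t_i), 1.0
    for _ in range(n_steps):
        k1 = degree_rate(t, k, t_i, m)
        k2 = degree_rate(t + h / 2, k + h * k1 / 2, t_i, m)
        k3 = degree_rate(t + h / 2, k + h * k2 / 2, t_i, m)
        k4 = degree_rate(t + h, k + h * k3, t_i, m)
        k += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return k

# For m = 0 the degree follows the strength; for m > 0 it stays below it.
k_m0 = integrate_degree(10, 1000, 0)
k_m3 = integrate_degree(10, 1000, 3)
```

Under this assumed form the degree can never exceed the strength, since the bracket is at most m + 1, which is the growth rate of the strength itself.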
We substitute Eq. (2) in Eq. (5) and we integrate it to obtain

$$k(t,t_i) = \frac{(m+1)\,m^{m+1}}{t_i^{\,m+1}}\,\exp\!\left[\frac{m}{t^{\frac{1}{m+2}}\,t_i^{\frac{m+1}{m+2}}}\right]\left\{\Gamma\!\left[-(m+1),\,\frac{m}{t^{\frac{1}{m+2}}\,t_i^{\frac{m+1}{m+2}}}\right] + C\right\}, \tag{6}$$

where $\Gamma(a,b)$ is the incomplete Gamma function and $C$ is an integration constant to be determined by the initial
condition $k(t_i,t_i) = 1$. For m = 0 the right hand side of Eq. (6) is an indeterminate form. Nevertheless, taking the limit for
$m \to 0$, we find again the result of Eq. (2), as we predicted.

We are interested in the behavior of the network for large values of t, so we expand the incomplete Gamma
function for small values of its second argument. Then we take the limit of the expression for $t \to \infty$ and obtain

$$k(t,t_i) \simeq \left(\frac{t}{t_i}\right)^{\frac{m+1}{m+2}}. \tag{7}$$

Using again Eq. (3) for the degree we get

$$P(k,t) \propto k^{-\frac{2m+3}{m+1}}, \tag{8}$$

which is again a stationary power-law distribution with exponent between 2 and 3.
FIG. 2: Results from a simulation of model A with 250000 vertices, for different values of m. On the left the strength distribution
is compared with Eq. (4). On the right the degree distribution is compared with Eq. (8). Simulation results are points while lines
represent the analytical results.

To check Eq. (6) we integrated it for different values of m and fixed t. This integral represents the number of occupied
cells of the adjacency matrix and can be compared with results obtained by simulations. The results are shown in
Fig. 3. As we can see the uncorrelated approximation is very good for small values of m, but it fails to reproduce the
behavior of the system for larger values of m, when correlations are stronger.

In Fig. 2 we plot the simulation results against Eq. (4) and Eq. (8) for different values of m. In the case of the strength
distribution the fit is excellent, while in the case of the degree distribution Eq. (8)
gives only an approximate fit for large values of m, because of the growing strength of correlations in
the network for large values of m.
FIG. 3: Comparison between the numerical integration of Eq. (6) and numerical simulations. This integral represents the number
of occupied cells of the adjacency matrix evaluated for t = 20000 and varying values of m, while the empirical data count the
effective number of occupied cells.



III. MODEL B 



In this section we build and analyze a multi-directed accelerated growing Eulerian network, that is, an accelerated
version of the previous model, which we will call ModelB. In order to do this we replace the constant addition of m
edges at each time step with a number of edges $m'$ that grows linearly with time, that is $m' = at$. In this way at every
time step we have an increasing number of edges added to the network. The results obtained and the techniques used
are similar to those in [?]. Nevertheless this extension of the previous model is designed to get closer to the
topology of real language networks, as they display an accelerated evolution, and it is important for completeness in
the discussion of the subject.

Keeping this in mind we can describe our modified model. We start with a chain of some connected vertices. At
each time step we create a new vertex and $at + 2$ new directed edges (Fig. 1). In particular at each time step

a- The new vertex will acquire one in-link with the constraint that the network must follow the EC.

b- The remaining $at + 1$ in-links will be attached to old vertices with a probability proportional to their in-strength
with the constraint that the network must follow the EC.

The coefficient $a$ will be chosen to fit with that found in real language networks [8].
The equation for the evolution of the strength of vertex $i$ is

$$\frac{ds(t,t_i)}{dt} = (at+1)\,\frac{s(t,t_i)}{\int_0^t dt_j\, s(t,t_j)}. \tag{9}$$

The right hand side of the last equation takes into account that $at + 1$ vertices can acquire a link with probability
proportional to their normalized strength $s(t,t_i)/\int_0^t dt_j\, s(t,t_j)$. The integral in the denominator on the right hand side of Eq. (9)
represents the total strength of the network and is $\frac{1}{2}at^2 + 2t$.
Solving Eq. (9) with initial condition $s(t_i,t_i) = 1$ we obtain

$$s(t,t_i) = \left(\frac{t}{t_i}\right)^{\frac{1}{2}}\left(\frac{at+4}{at_i+4}\right)^{\frac{3}{2}}. \tag{10}$$

To calculate the strength distribution we use the fact that

$$P(s,t) = -\frac{1}{t}\left.\frac{\partial t_i}{\partial s}\right|_{t_i = t_i(s,t)} \tag{11}$$

and we get

$$P(s,t) = \frac{t_i\,(at_i+4)}{2\,t\,s\,(at_i+1)}, \tag{12}$$
where $t_i(s,t)$ is the solution of Eq. (10).

This distribution has two regimes separated by a cross-over given approximately by $s_{cross} \approx (at)^{\frac{1}{2}}\left(\frac{a}{5}t + \frac{4}{5}\right)^{\frac{3}{2}}$.
Below this point Eq. (12) scales with a power law as

$$P(s) \propto s^{-\frac{3}{2}}, \tag{13}$$

while, for $s > s_{cross}$,

$$P(s) \propto s^{-3}. \tag{14}$$

These results are well confirmed by numerical simulations as shown in Fig. 4.

FIG. 4: Comparison between the numerical simulation for model B in a network of 50000 vertices and Eq. (13) and Eq. (14).
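The two scaling regimes can be checked numerically without inverting the strength equation, by treating the birth time $t_i$ as a parameter along the curve $(s, P)$. The sketch below assumes the closed forms $s(t,t_i) = (t/t_i)^{1/2}\,[(at+4)/(at_i+4)]^{3/2}$ and $P = t_i(at_i+4)/[2ts(at_i+1)]$, the latter obtained by differentiating the former; the function names and the parameter values $t = 50000$, $a = 0.001$ are illustrative assumptions, not values taken from the simulations above.

```python
import math

def strength_b(t, t_i, a):
    """Strength at time t of the vertex born at t_i in the accelerated model."""
    return math.sqrt(t / t_i) * ((a * t + 4.0) / (a * t_i + 4.0)) ** 1.5

def rhs_growth(t, t_i, a):
    """(a t + 1) s / (a t^2 / 2 + 2 t): growth rate implied by the model rules."""
    return (a * t + 1.0) * strength_b(t, t_i, a) / (0.5 * a * t * t + 2.0 * t)

def density_b(t, t_i, a):
    """P(s, t) = -(1/t) dt_i/ds, written as a function of the birth time t_i."""
    s = strength_b(t, t_i, a)
    return t_i * (a * t_i + 4.0) / (2.0 * t * s * (a * t_i + 1.0))

def local_slope(t, t_i, a, rel=1e-4):
    """Central-difference estimate of d ln P / d ln s at the vertex born at t_i."""
    lo, hi = t_i * (1.0 - rel), t_i * (1.0 + rel)
    d_ln_s = math.log(strength_b(t, hi, a)) - math.log(strength_b(t, lo, a))
    d_ln_p = math.log(density_b(t, hi, a)) - math.log(density_b(t, lo, a))
    return d_ln_p / d_ln_s

t, a = 50000.0, 0.001                   # illustrative values only
slope_young = local_slope(t, 25000.0, a)  # a * t_i >> 1: small s, below the cross-over
slope_old = local_slope(t, 1.0, a)        # a * t_i << 1: large s, above the cross-over
```

With these values `slope_young` comes out close to -3/2 and `slope_old` close to -3, and the cross-over corresponds to vertices born around $t_i \approx 1/a$.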



IV. MODEL C 



To complete this work we introduce a non-Eulerian version of model A, which we call ModelC.
We start with 2m randomly connected vertices. At each time step we create a new vertex and m + 2 new directed 
edges. In particular at each time step 

a- The new vertex will acquire one in-link and one out-link. 

b- The remaining m+1 out-links will be attached to old vertices with probability proportional to their out-strength. 

c- The remaining m+1 in-links will be attached to old vertices with probability proportional to their out-strength. 

For this model the same equations apply as with the Eulerian model A, with the same arguments, so that it displays
equivalent topological properties, that is weight, strength and degree distributions. The main difference at this level
of observation is that in the Eulerian case $s_i^{in} = s_i^{out}$ in an exact sense, while in this case this condition holds only on
average.



V. CONCLUSIONS 



In this work we contextualize phenomena that manifest as a continuous chain of repeated elements in a novel way, 
within the framework of network theory. We show that such phenomena, such as human language, DNA chains, 
etc., are described by Eulerian graphs. Eulerian graph topology ensures that every newly connected vertex of the 
network is connected to the last linked vertex. So we introduce and analyze different kinds of growing networks built 
to produce an Eulerian graph. We are able to find the main topological properties for this kind of network and we 
find that the resulting exponents for the strength and degree distributions are compatible with those of real networks. 
We then extend our model to a non-Eulerian one. 

It is worth noting that, in the context of standard network analysis, no striking differences emerge between the
Eulerian network and its non-Eulerian counterpart. We performed a clustering coefficient analysis, but we do not
show it, since the average number of triangles formed in the network does not differ significantly between the Eulerian
and non-Eulerian cases. Even a Shannon entropy analysis would not reveal any relevant
difference between the two growing mechanisms, since it is based on the frequency of the elements more than
on their structural organization. Considering this, and the fact that the geometries of the two networks are as
dissimilar as a tree and a chain, we would like to emphasize the lack of statistical tools in network theory
that can characterize the different morphologies of different networks in a statistically significant way.

This work is mainly focused on the analysis of written human language, but it is also important for the study
of directed and weighted growing networks. An important extension of these models, which could be taken into
consideration for further investigations, is the growth of a network governed by local growing rules. We showed in
a previous work [?] that local growing rules are important to reproduce interesting features of human language and
must be taken into account to generate a syntax-like structure.